For the final exam/project we will develop classification models using several approaches and compare their performance on a new dataset – the so-called “Census Income” dataset from UCI ML. It is available at the UCI ML web site, but so that we are not at the mercy of UCI ML availability, there is also a local copy of it on our website in Canvas as a zip-archive of all associated files. Among other things, the description for this dataset also presents performance (prediction accuracy) observed by the dataset providers using a variety of modeling techniques – this supplies a context for the errors of the models we will develop here.
Please note that the original data has been split up into training and test subsets, but there doesn’t seem to be anything particular about that split, so we might want to pool those two datasets together and split them into training and test as necessary ourselves. As you do that, please check that the attribute levels are consistent between those two files. For instance, the categorized income levels are indicated using slightly different notation in their training and test data. By now it should be quite straightforward for you to correct that when you pool them together.
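The label check and pooling described above can be sketched on toy data. The two small data frames below are hypothetical stand-ins for the real files, and stripping a trailing period is one possible way to reconcile the income labels, not necessarily the only one.

```r
# Toy stand-ins for the two files; the real data are read with read.table().
train <- data.frame(age = c(39, 50, 38),
                    salary = c(" <=50K", " >50K", " <=50K"),
                    stringsAsFactors = TRUE)
test <- data.frame(age = c(25, 44),
                   salary = c(" <=50K.", " >50K."),
                   stringsAsFactors = TRUE)

# Labels present in the test file but not in the training file reveal the mismatch.
setdiff(levels(test$salary), levels(train$salary))

# Strip the trailing period, then rebuild the factor on the training levels.
test$salary <- factor(sub("\\.$", "", as.character(test$salary)),
                      levels = levels(train$salary))

pooled <- rbind(train, test)
table(pooled$salary)
```

Running `setdiff()` on the levels of every factor column in both files is a cheap way to confirm that no other attribute differs in notation.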
Also, please note that there is a non-negligible number of rows with missing values that for most analyses cannot be included in the computation without modification. Please decide how you want to handle them and proceed accordingly. The simplest and perfectly acceptable approach would be to exclude those observations from the rest of the analyses, but if you have time and inclination to investigate the impact of imputing them by various means, you are welcome to try.
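As a rough illustration of the two options, the sketch below drops incomplete rows and, alternatively, fills a categorical attribute with its most frequent level. The toy data and the helper `impute_mode` are hypothetical; mode imputation is only one of many possible imputation schemes.

```r
# Toy data with missing entries coded as " ?", as in the raw files.
d <- data.frame(age = c(39, 50, 38, 53),
                workclass = c(" Private", " ?", " Private", " State-gov"),
                stringsAsFactors = FALSE)
d[d == " ?"] <- NA

# Option 1: drop incomplete rows.
complete <- na.omit(d)

# Option 2 (hypothetical helper): fill NAs with the most frequent observed level.
impute_mode <- function(x) {
  x[is.na(x)] <- names(which.max(table(x)))
  x
}
imputed <- d
imputed$workclass <- impute_mode(imputed$workclass)
```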
The attribute called “final weight” in the dataset description represents demographic weighting of these observations. Please disregard it for the purposes of this assignment.
Additionally, several attributes in this dataset are categorical variables with more than two levels (e.g. native country, occupation, etc.). Please make sure to translate them into corresponding sets of dummy indicator variables for the methods that require such conversion (e.g. PCA) – the R function model.matrix can be convenient for this, instead of generating those 0/1 indicators for each level of the factor manually (which is still perfectly fine). Some of those multi-level factors contain very sparsely populated categories – e.g. occupation “Armed-Forces” or work class “Never-worked” – it is your call whether you want to keep those observations in the data or exclude them on the basis that there is not enough data to adequately capture the impact of those categories. Feel free to experiment away!
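A minimal sketch of the conversion with model.matrix, on a made-up data frame (the column names and values are illustrative only):

```r
d <- data.frame(hours = c(40, 13, 40, 45),
                workclass = factor(c("Private", "State-gov", "Private", "Self-emp")))

# Default treatment contrasts: an intercept plus one indicator
# for every level except the reference level.
X <- model.matrix(~ ., data = d)
colnames(X)

# Dropping the intercept gives a full set of 0/1 indicators, one per level.
Z <- model.matrix(~ workclass - 1, data = d)
colnames(Z)
```

With the default contrasts the reference level is absorbed into the intercept, which is why a factor with k levels contributes k - 1 columns.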
Among the multi-level categorical attributes, the native country attribute has the largest number of levels – several-fold higher than any other attribute in this dataset – some of which have relatively few observations. This associated increase in dimensionality of the data may not be accompanied by a corresponding gain of resolution – e.g. would we expect this data to support the difference in income between descendants from Peru and Nicaragua, or from Cambodia and Laos? Please feel free to evaluate the impact of inclusion and/or omission of this attribute in/from the model and/or discretizing it differently (e.g. US/non-US, etc.).
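One possible coarser discretization is sketched below on a toy factor; the two-level US/non-US recoding and the name `country2` are assumptions made for illustration.

```r
country <- factor(c(" United-States", " Cuba", " United-States", " Peru", " Laos"))

# Hypothetical two-level recoding: US versus everything else.
country2 <- factor(ifelse(country == " United-States", "US", "non-US"),
                   levels = c("US", "non-US"))
table(country2)
```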
Lastly, the size of this dataset can make some of the modeling techniques run slower than what we were typically encountering in this class. You may find it helpful to do some of the exploration and model tuning on multiple random samples of smaller size as you decide on useful ranges of parameters/modeling choices, and then only perform a final run of fully debugged and working code on the full dataset.
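The idea of exploring on several smaller random samples can be sketched as follows. The data are simulated, and the sample size, number of repetitions and the helper `fit_on_sample` are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated stand-in for a large dataset with a binary outcome.
full <- data.frame(x = rnorm(5000))
full$y <- rbinom(5000, 1, plogis(-1 + 1.5 * full$x))

# Hypothetical helper: fit on one random subsample and return the slope.
fit_on_sample <- function(n_sub) {
  idx <- sample(nrow(full), n_sub)
  coef(glm(y ~ x, data = full[idx, ], family = binomial))[["x"]]
}

# Repeating the fit shows how stable the estimate is at this sample size.
slopes <- replicate(10, fit_on_sample(500))
summary(slopes)
```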
#Prepare the data
# combine both the datasets
setwd("/Users/RaviRani/Documents/Harvard-Extension/CSCI E-63/finalexam")
traindata<-read.table("adult.data.1",sep=",",header=FALSE,quote="",stringsAsFactors=TRUE)
ncol(traindata)
## [1] 15
colnames(traindata) <- c("age","workclass","fnlwgt","education","education_num","marital_status","occupation","relationship","race","sex","capital_gain","capital_loss","hours_per_week","native_country","salary")
testdata<-read.table("adult.test",sep=",",header=FALSE,quote="",stringsAsFactors=TRUE)
colnames(testdata) <- c("age","workclass","fnlwgt","education","education_num","marital_status","occupation","relationship","race","sex","capital_gain","capital_loss","hours_per_week","native_country","salary")
ncol(testdata)
## [1] 15
testdata$salary = ifelse(grepl("( <=50K.)",testdata$salary)," <=50K"," >50K")
#testdata$salary[testdata$salary == " <=50K."]<-" <=50K"
#testdata[salary == " >50K."]=" >50K"
#head(traindata)
#head(testdata)
# remove 'final weight' attribute
merged.data <- rbind(traindata[,-3], testdata[,-3])
table(merged.data$salary)
##
## <=50K >50K
## 37154 11687
nrow(traindata)
## [1] 32560
nrow(testdata)
## [1] 16281
nrow(merged.data)
## [1] 48841
class(merged.data)
## [1] "data.frame"
head(merged.data)
## age workclass education education_num marital_status
## 1 39 State-gov Bachelors 13 Never-married
## 2 50 Self-emp-not-inc Bachelors 13 Married-civ-spouse
## 3 38 Private HS-grad 9 Divorced
## 4 53 Private 11th 7 Married-civ-spouse
## 5 28 Private Bachelors 13 Married-civ-spouse
## 6 37 Private Masters 14 Married-civ-spouse
## occupation relationship race sex capital_gain
## 1 Adm-clerical Not-in-family White Male 2174
## 2 Exec-managerial Husband White Male 0
## 3 Handlers-cleaners Not-in-family White Male 0
## 4 Handlers-cleaners Husband Black Male 0
## 5 Prof-specialty Wife Black Female 0
## 6 Exec-managerial Wife White Female 0
## capital_loss hours_per_week native_country salary
## 1 0 40 United-States <=50K
## 2 0 13 United-States <=50K
## 3 0 40 United-States <=50K
## 4 0 40 United-States <=50K
## 5 0 40 Cuba <=50K
## 6 0 40 United-States <=50K
merged.data[merged.data == " ?"]=NA
merged.data$native_country<-factor(merged.data$native_country)
merged.data$workclass<-factor(merged.data$workclass)
merged.data$occupation<-factor(merged.data$occupation)
#class(salary)
#attach(merged.data)
# after removing "?" with NA
head(merged.data)
## age workclass education education_num marital_status
## 1 39 State-gov Bachelors 13 Never-married
## 2 50 Self-emp-not-inc Bachelors 13 Married-civ-spouse
## 3 38 Private HS-grad 9 Divorced
## 4 53 Private 11th 7 Married-civ-spouse
## 5 28 Private Bachelors 13 Married-civ-spouse
## 6 37 Private Masters 14 Married-civ-spouse
## occupation relationship race sex capital_gain
## 1 Adm-clerical Not-in-family White Male 2174
## 2 Exec-managerial Husband White Male 0
## 3 Handlers-cleaners Not-in-family White Male 0
## 4 Handlers-cleaners Husband Black Male 0
## 5 Prof-specialty Wife Black Female 0
## 6 Exec-managerial Wife White Female 0
## capital_loss hours_per_week native_country salary
## 1 0 40 United-States <=50K
## 2 0 13 United-States <=50K
## 3 0 40 United-States <=50K
## 4 0 40 United-States <=50K
## 5 0 40 Cuba <=50K
## 6 0 40 United-States <=50K
# remove rows with NA's - we will be using this data set for our calculations
noNAData=na.omit(merged.data)
noNAData$native_country<-factor(noNAData$native_country)
noNAData$workclass<-factor(noNAData$workclass)
noNAData$occupation<-factor(noNAData$occupation)
#Normalize the numeric variables
num.vars <- sapply(noNAData, is.numeric)
noNAData[num.vars] <- lapply(noNAData[num.vars], scale)
library(Amelia)  # provides missmap()
missmap(noNAData, main = "Missing values vs observed")
attach(noNAData)
is.factor(workclass)
## [1] TRUE
is.factor(race)
## [1] TRUE
is.factor(sex)
## [1] TRUE
is.factor(marital_status)
## [1] TRUE
is.factor(occupation)
## [1] TRUE
is.factor(education)
## [1] TRUE
is.factor(relationship)
## [1] TRUE
contrasts(workclass)
## Local-gov Private Self-emp-inc Self-emp-not-inc
## Federal-gov 0 0 0 0
## Local-gov 1 0 0 0
## Private 0 1 0 0
## Self-emp-inc 0 0 1 0
## Self-emp-not-inc 0 0 0 1
## State-gov 0 0 0 0
## Without-pay 0 0 0 0
## State-gov Without-pay
## Federal-gov 0 0
## Local-gov 0 0
## Private 0 0
## Self-emp-inc 0 0
## Self-emp-not-inc 0 0
## State-gov 1 0
## Without-pay 0 1
contrasts(race)
## Asian-Pac-Islander Black Other White
## Amer-Indian-Eskimo 0 0 0 0
## Asian-Pac-Islander 1 0 0 0
## Black 0 1 0 0
## Other 0 0 1 0
## White 0 0 0 1
contrasts(sex)
## Male
## Female 0
## Male 1
contrasts(marital_status)
## Married-AF-spouse Married-civ-spouse
## Divorced 0 0
## Married-AF-spouse 1 0
## Married-civ-spouse 0 1
## Married-spouse-absent 0 0
## Never-married 0 0
## Separated 0 0
## Widowed 0 0
## Married-spouse-absent Never-married Separated
## Divorced 0 0 0
## Married-AF-spouse 0 0 0
## Married-civ-spouse 0 0 0
## Married-spouse-absent 1 0 0
## Never-married 0 1 0
## Separated 0 0 1
## Widowed 0 0 0
## Widowed
## Divorced 0
## Married-AF-spouse 0
## Married-civ-spouse 0
## Married-spouse-absent 0
## Never-married 0
## Separated 0
## Widowed 1
contrasts(occupation)
## Armed-Forces Craft-repair Exec-managerial
## Adm-clerical 0 0 0
## Armed-Forces 1 0 0
## Craft-repair 0 1 0
## Exec-managerial 0 0 1
## Farming-fishing 0 0 0
## Handlers-cleaners 0 0 0
## Machine-op-inspct 0 0 0
## Other-service 0 0 0
## Priv-house-serv 0 0 0
## Prof-specialty 0 0 0
## Protective-serv 0 0 0
## Sales 0 0 0
## Tech-support 0 0 0
## Transport-moving 0 0 0
## Farming-fishing Handlers-cleaners Machine-op-inspct
## Adm-clerical 0 0 0
## Armed-Forces 0 0 0
## Craft-repair 0 0 0
## Exec-managerial 0 0 0
## Farming-fishing 1 0 0
## Handlers-cleaners 0 1 0
## Machine-op-inspct 0 0 1
## Other-service 0 0 0
## Priv-house-serv 0 0 0
## Prof-specialty 0 0 0
## Protective-serv 0 0 0
## Sales 0 0 0
## Tech-support 0 0 0
## Transport-moving 0 0 0
## Other-service Priv-house-serv Prof-specialty
## Adm-clerical 0 0 0
## Armed-Forces 0 0 0
## Craft-repair 0 0 0
## Exec-managerial 0 0 0
## Farming-fishing 0 0 0
## Handlers-cleaners 0 0 0
## Machine-op-inspct 0 0 0
## Other-service 1 0 0
## Priv-house-serv 0 1 0
## Prof-specialty 0 0 1
## Protective-serv 0 0 0
## Sales 0 0 0
## Tech-support 0 0 0
## Transport-moving 0 0 0
## Protective-serv Sales Tech-support Transport-moving
## Adm-clerical 0 0 0 0
## Armed-Forces 0 0 0 0
## Craft-repair 0 0 0 0
## Exec-managerial 0 0 0 0
## Farming-fishing 0 0 0 0
## Handlers-cleaners 0 0 0 0
## Machine-op-inspct 0 0 0 0
## Other-service 0 0 0 0
## Priv-house-serv 0 0 0 0
## Prof-specialty 0 0 0 0
## Protective-serv 1 0 0 0
## Sales 0 1 0 0
## Tech-support 0 0 1 0
## Transport-moving 0 0 0 1
contrasts(education)
## 11th 12th 1st-4th 5th-6th 7th-8th 9th Assoc-acdm
## 10th 0 0 0 0 0 0 0
## 11th 1 0 0 0 0 0 0
## 12th 0 1 0 0 0 0 0
## 1st-4th 0 0 1 0 0 0 0
## 5th-6th 0 0 0 1 0 0 0
## 7th-8th 0 0 0 0 1 0 0
## 9th 0 0 0 0 0 1 0
## Assoc-acdm 0 0 0 0 0 0 1
## Assoc-voc 0 0 0 0 0 0 0
## Bachelors 0 0 0 0 0 0 0
## Doctorate 0 0 0 0 0 0 0
## HS-grad 0 0 0 0 0 0 0
## Masters 0 0 0 0 0 0 0
## Preschool 0 0 0 0 0 0 0
## Prof-school 0 0 0 0 0 0 0
## Some-college 0 0 0 0 0 0 0
## Assoc-voc Bachelors Doctorate HS-grad Masters
## 10th 0 0 0 0 0
## 11th 0 0 0 0 0
## 12th 0 0 0 0 0
## 1st-4th 0 0 0 0 0
## 5th-6th 0 0 0 0 0
## 7th-8th 0 0 0 0 0
## 9th 0 0 0 0 0
## Assoc-acdm 0 0 0 0 0
## Assoc-voc 1 0 0 0 0
## Bachelors 0 1 0 0 0
## Doctorate 0 0 1 0 0
## HS-grad 0 0 0 1 0
## Masters 0 0 0 0 1
## Preschool 0 0 0 0 0
## Prof-school 0 0 0 0 0
## Some-college 0 0 0 0 0
## Preschool Prof-school Some-college
## 10th 0 0 0
## 11th 0 0 0
## 12th 0 0 0
## 1st-4th 0 0 0
## 5th-6th 0 0 0
## 7th-8th 0 0 0
## 9th 0 0 0
## Assoc-acdm 0 0 0
## Assoc-voc 0 0 0
## Bachelors 0 0 0
## Doctorate 0 0 0
## HS-grad 0 0 0
## Masters 0 0 0
## Preschool 1 0 0
## Prof-school 0 1 0
## Some-college 0 0 1
contrasts(relationship)
## Not-in-family Other-relative Own-child Unmarried Wife
## Husband 0 0 0 0 0
## Not-in-family 1 0 0 0 0
## Other-relative 0 1 0 0 0
## Own-child 0 0 1 0 0
## Unmarried 0 0 0 1 0
## Wife 0 0 0 0 1
# Take a back up
noNAData.bk<-noNAData
#data frame with factors converted into numeric codes (codes follow the order of the factor levels)
noNAData.num<-noNAData
factor.vars<-c('workclass','education','marital_status','occupation','relationship','race','sex','native_country')
noNAData.num[factor.vars]<-lapply(noNAData[factor.vars], as.numeric)
The code above prepares the data for the analyses below. First, we read both files (adult.data.1 and adult.test) and bring the income labels of the test file in line with those of the training file. The two data frames are then merged, dropping the fnlwgt column as instructed in the preface. Entries coded as " ?" are replaced with NA, rows containing NAs are removed, and the numeric attributes are scaled. Finally, missmap is used to check that no missing values remain in the data frame.
Download and read “Census Income” data into R and prepare graphical and numerical summaries of it: e.g. histograms of continuous attributes, contingency tables of categorical variables, scatterplots of continuous attributes with some of the categorical variables indicated by color/symbol shape, etc. Perform principal components analysis of this data (do you need to scale it prior to that? how would you represent multilevel categorical attributes to be used as inputs for PCA?) and plot observations in the space of the first few principal components with subjects’ gender and/or categorized income indicated by color/shape of the symbol. Perform univariate assessment of associations between outcome we will be modeling and each of the attributes (e.g. t-test or logistic regression for continuous attributes, contingency tables/Fisher exact test/\(\chi^2\) test for categorical attributes). Summarize your observations from these assessments: does it appear that there is association between outcome and predictors? Which predictors seem to be more/less relevant?
The continuous attributes are: age, education_num, capital_gain, capital_loss and hours_per_week; the categorical attributes are: workclass, education, marital_status, occupation, relationship, race, sex and native_country.
We now draw histograms of the continuous attributes and build contingency tables of the categorical attributes.
# histograms of the continuous attributes (columns of the attached noNAData)
library(ggplot2)  # provides qplot() and ggplot()
qplot(age, geom="histogram",na.rm = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(education_num, geom="histogram",na.rm = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(capital_gain, geom="histogram",na.rm = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(capital_loss, geom="histogram",na.rm = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(hours_per_week, geom="histogram",na.rm = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The histograms of the continuous attributes above show that ages range from about 17 to 90 years. Capital gain and capital loss are 0 in most cases, and most people work 40 hours per week. Next we build contingency tables for the categorical attributes.
table(sex)
## sex
## Female Male
## 14694 30527
table(education)
## education
## 10th 11th 12th 1st-4th 5th-6th
## 1223 1619 577 222 449
## 7th-8th 9th Assoc-acdm Assoc-voc Bachelors
## 823 676 1507 1959 7570
## Doctorate HS-grad Masters Preschool Prof-school
## 544 14783 2514 72 785
## Some-college
## 9898
table(workclass)
## workclass
## Federal-gov Local-gov Private Self-emp-inc
## 1406 3100 33306 1646
## Self-emp-not-inc State-gov Without-pay
## 3796 1946 21
table(marital_status)
## marital_status
## Divorced Married-AF-spouse Married-civ-spouse
## 6297 32 21055
## Married-spouse-absent Never-married Separated
## 552 14597 1411
## Widowed
## 1277
table(occupation)
## occupation
## Adm-clerical Armed-Forces Craft-repair
## 5540 14 6020
## Exec-managerial Farming-fishing Handlers-cleaners
## 5984 1480 2046
## Machine-op-inspct Other-service Priv-house-serv
## 2969 4808 232
## Prof-specialty Protective-serv Sales
## 6008 976 5408
## Tech-support Transport-moving
## 1420 2316
table(relationship)
## relationship
## Husband Not-in-family Other-relative Own-child
## 18666 11702 1348 6626
## Unmarried Wife
## 4788 2091
table(race)
## race
## Amer-Indian-Eskimo Asian-Pac-Islander Black
## 435 1303 4228
## Other White
## 353 38902
The contingency tables above show how the observations are distributed across the levels of each categorical attribute. Next we draw scatterplots of some continuous attributes against categorical attributes.
# education against education_num, with shape by education and color by sex
ggplot(merged.data, aes(x=education, y=education_num, shape=education, color=sex)) +
geom_point()+scale_shape_manual(values=seq(0,15))
# hours_per_week against education_num, colored by education
ggplot(merged.data, aes(x=education_num, y=hours_per_week, color=education)) +
geom_point()
# hours_per_week against marital_status
ggplot(merged.data, aes(x=marital_status, y=hours_per_week, color=marital_status)) +
geom_point()
# education against capital_gain, colored by sex
ggplot(merged.data, aes(x=capital_gain, y=education, color=sex)) +
geom_point()
# The regression model below would show the association between the response variable and the independent variable(s)
#summary(lm(as.numeric(salary)~.,data=merged.data,na.action=na.omit))
#outModel<-model.matrix(~ sex + education+workclass, data=merged.data, contrasts.arg=list(sex=diag(nlevels(sex)), education=diag(nlevels(education)),workclass=diag(nlevels(workclass)),marital_status=diag(nlevels(marital_status)),occupation=diag(nlevels(occupation)),relationship=diag(nlevels(relationship)),race=diag(nlevels(race)),native_country=diag(nlevels(native_country))))
#PCA rendition of untransformed data
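# model.matrix() drops rows with NAs by default, so modelOut has the same rows as noNAData
modelOut<-model.matrix(salary ~ ., data = merged.data)
# drop constant columns (the intercept) and scale the rest to unit variance
pca.out<-prcomp(modelOut[ , apply(modelOut, 2, var) != 0],scale.=TRUE)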
#pca.out<-prcomp(model.matrix(salary ~ ., data = merged.data),na.rm = TRUE)
#pca.out
#center and scale refers to respective mean and standard deviation of the variables that are used for normalization prior to implementing PCA
#outputs the mean of variables
#pca.out$center
#outputs the standard deviation of variables
#pca.out$scale
#rotation measure provides the principal component loading. Each column of rotation matrix contains the principal component loading vector.
#pca.out$rotation
#compute the principal component score vector
dim(pca.out$x)
## [1] 45221 94
biplot(pca.out, scale = 0)
# plot of PCA results for PC1 & PC2
plot(pca.out$x[,1:2])
#Squared loadings of the attributes on PC1 in decreasing order
sort(pca.out$rotation[,1]^2,decreasing=TRUE)
## marital_status Married-civ-spouse
## 0.12100919885903
## marital_status Never-married
## 0.09572314053910
## education_num
## 0.09337405025801
## age
## 0.06515874947188
## relationship Own-child
## 0.06144529658745
## workclass Private
## 0.06091603349574
## hours_per_week
## 0.04808387377377
## occupation Prof-specialty
## 0.03677811283382
## occupation Other-service
## 0.03531619989403
## sex Male
## 0.03529978662821
## education Bachelors
## 0.02459180553456
## education Masters
## 0.02255568018513
## occupation Exec-managerial
## 0.02253853420721
## race White
## 0.02252612874413
## workclass Self-emp-not-inc
## 0.02058712821548
## race Black
## 0.02052976885101
## workclass Self-emp-inc
## 0.01942700192919
## education Prof-school
## 0.01605850012832
## capital_gain
## 0.01263615083136
## education 11th
## 0.01186491384772
## education Doctorate
## 0.01057698940970
## occupation Handlers-cleaners
## 0.00993882465299
## relationship Unmarried
## 0.00980569140425
## native_country United-States
## 0.00956512200188
## relationship Other-relative
## 0.00944614789949
## native_country Mexico
## 0.00862008068027
## workclass Local-gov
## 0.00848714237035
## education HS-grad
## 0.00846032225258
## relationship Not-in-family
## 0.00841755607441
## capital_loss
## 0.00710855260751
## occupation Machine-op-inspct
## 0.00612297248830
## education 5th-6th
## 0.00508876482106
## education Some-college
## 0.00441212972034
## workclass State-gov
## 0.00408633873808
## marital_status Separated
## 0.00399939738249
## relationship Wife
## 0.00379152235034
## education 12th
## 0.00344299663247
## occupation Priv-house-serv
## 0.00343209168229
## education 9th
## 0.00302496996257
## education 1st-4th
## 0.00290182694383
## race Other
## 0.00276922143725
## native_country El-Salvador
## 0.00210076923169
## education Preschool
## 0.00156687236331
## marital_status Married-spouse-absent
## 0.00153866700923
## native_country Guatemala
## 0.00138905837853
## native_country Jamaica
## 0.00126641311593
## native_country Dominican-Republic
## 0.00121606660867
## occupation Protective-serv
## 0.00118817263750
## education 7th-8th
## 0.00117500190304
## native_country Haiti
## 0.00102087930751
## native_country Puerto-Rico
## 0.00093066160700
## marital_status Widowed
## 0.00070749976192
## occupation Farming-fishing
## 0.00064837501097
## native_country Vietnam
## 0.00059069316107
## native_country Philippines
## 0.00053197777105
## education Assoc-voc
## 0.00052937765618
## race Asian-Pac-Islander
## 0.00052073556388
## education Assoc-acdm
## 0.00046313973386
## native_country Nicaragua
## 0.00035422153742
## native_country Columbia
## 0.00020785074890
## occupation Craft-repair
## 0.00019004500653
## native_country Trinadad&Tobago
## 0.00017026048074
## native_country Peru
## 0.00015662803944
## occupation Sales
## 0.00015074115200
## native_country Ecuador
## 0.00015060499127
## native_country India
## 0.00014989241861
## native_country Honduras
## 0.00013518036823
## native_country Taiwan
## 0.00013414416108
## native_country Laos
## 0.00012113784026
## native_country Portugal
## 0.00011843875804
## native_country Iran
## 0.00011275413845
## native_country Outlying-US(Guam-USVI-etc)
## 0.00010380241843
## native_country Greece
## 0.00009941741569
## native_country Canada
## 0.00007309876965
## native_country France
## 0.00005355600938
## native_country England
## 0.00004079613305
## native_country Ireland
## 0.00003040508972
## native_country Hungary
## 0.00002612568436
## native_country Thailand
## 0.00002107465398
## native_country Poland
## 0.00001640129549
## native_country South
## 0.00001610077479
## native_country Germany
## 0.00001521433284
## native_country Hong
## 0.00001294478503
## occupation Tech-support
## 0.00001203019074
## occupation Transport-moving
## 0.00000868826311
## native_country China
## 0.00000489630559
## occupation Armed-Forces
## 0.00000279644699
## native_country Scotland
## 0.00000253223954
## native_country Italy
## 0.00000189433954
## native_country Cuba
## 0.00000152235966
## marital_status Married-AF-spouse
## 0.00000104279005
## workclass Without-pay
## 0.00000044248292
## native_country Yugoslavia
## 0.00000019940312
## native_country Japan
## 0.00000004102641
#Squared loadings of the attributes on PC2 in decreasing order
sort(pca.out$rotation[,2]^2,decreasing=TRUE)
## education_num
## 0.153675547109
## marital_status Married-civ-spouse
## 0.092078036806
## marital_status Never-married
## 0.088055140710
## native_country United-States
## 0.054495939716
## occupation Prof-specialty
## 0.050281611464
## sex Male
## 0.049819305110
## native_country Mexico
## 0.045887001723
## education Bachelors
## 0.042088471867
## relationship Not-in-family
## 0.041526113041
## education HS-grad
## 0.034926559397
## age
## 0.034309836719
## occupation Craft-repair
## 0.033204048328
## education 5th-6th
## 0.031625974086
## relationship Own-child
## 0.028444045368
## education 7th-8th
## 0.024325703148
## education 1st-4th
## 0.018482173210
## education Masters
## 0.016467539944
## occupation Machine-op-inspct
## 0.014127625405
## occupation Farming-fishing
## 0.014000280274
## occupation Transport-moving
## 0.012804861940
## education 9th
## 0.010922920543
## workclass State-gov
## 0.009546828386
## workclass Self-emp-not-inc
## 0.008219859768
## hours_per_week
## 0.007931369023
## workclass Local-gov
## 0.007499857552
## education Some-college
## 0.005859703301
## education Doctorate
## 0.004205018821
## race Other
## 0.004155373216
## education Preschool
## 0.004050356428
## native_country El-Salvador
## 0.003966561420
## education Prof-school
## 0.003746447659
## occupation Exec-managerial
## 0.003118614137
## education Assoc-acdm
## 0.002931217726
## relationship Other-relative
## 0.002929457015
## occupation Tech-support
## 0.002808244597
## native_country Portugal
## 0.002781979313
## native_country Puerto-Rico
## 0.002680038828
## native_country Guatemala
## 0.002610164436
## native_country Dominican-Republic
## 0.002444602570
## native_country Italy
## 0.002423624820
## occupation Handlers-cleaners
## 0.002120961904
## occupation Sales
## 0.002077514894
## workclass Private
## 0.001657317523
## native_country Cuba
## 0.001407572798
## marital_status Married-spouse-absent
## 0.001346070940
## occupation Priv-house-serv
## 0.001223218088
## race Black
## 0.001193052087
## race Asian-Pac-Islander
## 0.000905655231
## native_country Columbia
## 0.000877723172
## native_country Philippines
## 0.000866153938
## education 11th
## 0.000825956590
## native_country Ecuador
## 0.000745687004
## native_country Haiti
## 0.000718631827
## native_country Greece
## 0.000705537699
## native_country Poland
## 0.000667866961
## relationship Unmarried
## 0.000667184016
## native_country Vietnam
## 0.000621013539
## capital_gain
## 0.000507609683
## native_country Laos
## 0.000479030184
## native_country Nicaragua
## 0.000451781074
## native_country China
## 0.000425993011
## native_country Canada
## 0.000392148130
## native_country South
## 0.000385627861
## native_country Yugoslavia
## 0.000342690549
## education Assoc-voc
## 0.000225841682
## native_country Trinadad&Tobago
## 0.000224789378
## occupation Other-service
## 0.000219390054
## education 12th
## 0.000189343134
## native_country Peru
## 0.000171580535
## native_country Honduras
## 0.000170691426
## workclass Self-emp-inc
## 0.000156700765
## native_country Ireland
## 0.000144313907
## native_country Hong
## 0.000136951233
## native_country Germany
## 0.000124694610
## workclass Without-pay
## 0.000121017646
## marital_status Widowed
## 0.000119824884
## native_country Jamaica
## 0.000117147656
## native_country Japan
## 0.000106859685
## relationship Wife
## 0.000105830302
## native_country Thailand
## 0.000099156329
## occupation Protective-serv
## 0.000085429108
## native_country Scotland
## 0.000077207882
## native_country Taiwan
## 0.000060434185
## native_country England
## 0.000055459906
## capital_loss
## 0.000041399485
## native_country Hungary
## 0.000038010100
## native_country Iran
## 0.000035831003
## native_country India
## 0.000032878240
## marital_status Married-AF-spouse
## 0.000027009349
## occupation Armed-Forces
## 0.000023954116
## native_country Outlying-US(Guam-USVI-etc)
## 0.000009759741
## race White
## 0.000004679474
## marital_status Separated
## 0.000002579003
## native_country France
## 0.000001181557
#plot observations in the space of the first two principal components, marked by gender
#(noNAData has the same rows as pca.out$x; merged.data still contains the rows with NAs)
plot(pca.out$x[,1:2],col=c("red","blue")[as.numeric(noNAData$sex)],pch=as.numeric(noNAData$sex))
legend("topleft",levels(noNAData$sex),pch=1:2,col=c("red","blue"),text.col=c("red","blue"))
#plot observations in the space of the first two principal components, marked by salary
plot(pca.out$x[,1:2],col=c("red","blue")[as.numeric(noNAData$salary)],pch=as.numeric(noNAData$salary))
legend("topleft",levels(noNAData$salary),pch=1:2,col=c("red","blue"),text.col=c("red","blue"))
Categorical attributes such as education, sex and marital_status were included in the PCA by converting them to dummy (0/1 indicator) variables, which is a commonly used way of representing a categorical input numerically.
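The dummy coding followed by PCA can be sketched on a small made-up data frame. The column names and values below are illustrative only, and the squared loadings are read as relative weights of each column in a component.

```r
d <- data.frame(age = c(39, 50, 38, 53, 28, 37),
                sex = factor(c("Male", "Male", "Male", "Male", "Female", "Female")),
                edu = factor(c("BS", "BS", "HS", "HS", "BS", "MS")))

X <- model.matrix(~ ., data = d)      # dummy-code the factors
X <- X[, apply(X, 2, var) != 0]       # drops the constant intercept column
pca <- prcomp(X, scale. = TRUE)       # scale columns to unit variance

# Squared loadings of a component sum to 1, so they act as relative weights.
w1 <- sort(pca$rotation[, 1]^2, decreasing = TRUE)
head(w1, 3)
sum(w1)
```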
From the squared loadings printed above, the five attributes with the largest weight in PC1 are: marital_status Married-civ-spouse (0.1210), marital_status Never-married (0.0957), education_num (0.0934), age (0.0652) and relationship Own-child (0.0614).
For PC2, the five attributes with the largest weight are: education_num (0.1537), marital_status Married-civ-spouse (0.0921), marital_status Never-married (0.0881), native_country United-States (0.0545) and occupation Prof-specialty (0.0503).
This suggests that the attributes most likely to affect salary are marital_status Married-civ-spouse, marital_status Never-married and education_num, since they carry the largest weights in both components.
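PCA loadings describe variance among the predictors rather than their association with the outcome, so the univariate assessment requested in the problem statement is usually done with separate tests. The sketch below uses simulated data (not the census data) in which both predictors depend on the outcome by construction; the variable names are hypothetical.

```r
set.seed(2)  # arbitrary seed
n <- 400
salary <- factor(sample(c("<=50K", ">50K"), n, replace = TRUE, prob = c(0.75, 0.25)))

# Simulated predictors: both are shifted for the ">50K" group.
edu_num <- rnorm(n, mean = ifelse(salary == ">50K", 12, 10), sd = 2.5)
married <- factor(ifelse(runif(n) < ifelse(salary == ">50K", 0.8, 0.4), "yes", "no"))

t_res <- t.test(edu_num ~ salary)               # continuous attribute
chi_res <- chisq.test(table(married, salary))   # categorical attribute
c(t = t_res$p.value, chisq = chi_res$p.value)
```

Small p-values here indicate association with the outcome; on the real data the same calls would be applied to each attribute in turn.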
Develop logistic regression model of the outcome as a function of multiple predictors in the model. Which variables are significantly associated with the outcome? Test model performance on multiple splits of data into training and test subsets, summarize it in terms of accuracy/error, sensitivity/specificity and compare to the performance of other methods reported in the dataset description.
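The evaluation over multiple training/test splits requested above can be sketched as follows. The data are simulated, and the 70/30 split, the 0.5 cutoff, the 20 repetitions and the helper `one_split` are assumptions for illustration rather than choices taken from the analysis.

```r
set.seed(3)  # arbitrary seed
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(runif(n) < plogis(-1 + 1.2 * d$x1 - 0.8 * d$x2), ">50K", "<=50K"),
              levels = c("<=50K", ">50K"))

# Hypothetical helper: one random split, fit on training rows, score on test rows.
one_split <- function(train_frac = 0.7) {
  idx <- sample(nrow(d), floor(train_frac * nrow(d)))
  fit <- glm(y ~ ., data = d[idx, ], family = binomial)
  p <- predict(fit, newdata = d[-idx, ], type = "response")  # P(y == ">50K")
  pred <- factor(ifelse(p > 0.5, ">50K", "<=50K"), levels = levels(d$y))
  tab <- table(truth = d$y[-idx], pred = pred)
  c(accuracy    = sum(diag(tab)) / sum(tab),
    sensitivity = tab[">50K", ">50K"] / sum(tab[">50K", ]),
    specificity = tab["<=50K", "<=50K"] / sum(tab["<=50K", ]))
}

res <- t(replicate(20, one_split()))
colMeans(res)
```

Here sensitivity is the fraction of true ">50K" cases predicted as ">50K", and specificity is the corresponding fraction for "<=50K".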
# logistic regression on whole data
glm.fit=glm(salary~.,data=noNAData,control=glm.control(epsilon = 1e-8, maxit = 50, trace = FALSE),family=binomial)
summary(glm.fit)
##
## Call:
## glm(formula = salary ~ ., family = binomial, data = noNAData,
## control = glm.control(epsilon = 0.00000001, maxit = 50, trace = FALSE))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.0551 -0.5143 -0.1916 -0.0208 3.8486
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value
## (Intercept) -4.114910 0.640615 -6.423
## age 0.327399 0.018411 17.783
## workclass Local-gov -0.638646 0.092487 -6.905
## workclass Private -0.486967 0.077123 -6.314
## workclass Self-emp-inc -0.349237 0.101236 -3.450
## workclass Self-emp-not-inc -1.033051 0.090373 -11.431
## workclass State-gov -0.794305 0.101996 -7.788
## workclass Without-pay -1.376380 0.787450 -1.748
## education 11th 0.096182 0.177925 0.541
## education 12th 0.473130 0.225373 2.099
## education 1st-4th -0.488936 0.421520 -1.160
## education 5th-6th -0.260869 0.279729 -0.933
## education 7th-8th -0.467023 0.198927 -2.348
## education 9th -0.258011 0.222550 -1.159
## education Assoc-acdm 1.406545 0.149984 9.378
## education Assoc-voc 1.312350 0.144665 9.072
## education Bachelors 1.966305 0.134958 14.570
## education Doctorate 2.874407 0.184094 15.614
## education HS-grad 0.843652 0.131413 6.420
## education Masters 2.296984 0.143318 16.027
## education Preschool -4.963364 3.516624 -1.411
## education Prof-school 2.905179 0.173474 16.747
## education Some-college 1.202055 0.133315 9.017
## education_num NA NA NA
## marital_status Married-AF-spouse 2.614753 0.484757 5.394
## marital_status Married-civ-spouse 2.296402 0.225347 10.191
## marital_status Married-spouse-absent 0.187137 0.187204 1.000
## marital_status Never-married -0.422114 0.073214 -5.766
## marital_status Separated -0.007293 0.134736 -0.054
## marital_status Widowed 0.121776 0.129867 0.938
## occupation Armed-Forces 0.200785 0.891530 0.225
## occupation Craft-repair 0.060327 0.065591 0.920
## occupation Exec-managerial 0.780109 0.063219 12.340
## occupation Farming-fishing -0.989079 0.115658 -8.552
## occupation Handlers-cleaners -0.684971 0.115127 -5.950
## occupation Machine-op-inspct -0.287707 0.083668 -3.439
## occupation Other-service -0.882103 0.097758 -9.023
## occupation Priv-house-serv -1.988047 0.754023 -2.637
## occupation Prof-specialty 0.522651 0.066758 7.829
## occupation Protective-serv 0.498353 0.103267 4.826
## occupation Sales 0.262431 0.067504 3.888
## occupation Tech-support 0.560657 0.090539 6.192
## occupation Transport-moving -0.089697 0.081313 -1.103
## relationship Not-in-family 0.513363 0.222933 2.303
## relationship Other-relative -0.484330 0.202716 -2.389
## relationship Own-child -0.580790 0.219134 -2.650
## relationship Unmarried 0.336870 0.236386 1.425
## relationship Wife 1.128630 0.085572 13.189
## race Asian-Pac-Islander 0.913724 0.228724 3.995
## race Black 0.374231 0.190972 1.960
## race Other 0.510594 0.281169 1.816
## race White 0.564560 0.181550 3.110
## sex Male 0.717838 0.065202 11.009
## capital_gain 2.391640 0.065547 36.488
## capital_loss 0.261820 0.012639 20.716
## hours_per_week 0.348198 0.016504 21.098
## native_country Canada -0.164305 0.583254 -0.282
## native_country China -1.512931 0.594731 -2.544
## native_country Columbia -2.885063 0.832196 -3.467
## native_country Cuba -0.519798 0.601975 -0.863
## native_country Dominican-Republic -1.711068 0.773840 -2.211
## native_country Ecuador -1.121192 0.783790 -1.430
## native_country El-Salvador -1.245184 0.683252 -1.822
## native_country England -0.280294 0.604734 -0.464
## native_country France -0.008739 0.698144 -0.013
## native_country Germany -0.602822 0.583203 -1.034
## native_country Greece -0.918270 0.657657 -1.396
## native_country Guatemala -1.236473 0.899057 -1.375
## native_country Haiti -0.438156 0.720537 -0.608
## native_country Honduras -0.657362 1.250473 -0.526
## native_country Hong -1.278501 0.788960 -1.620
## native_country Hungary -0.327678 0.794134 -0.413
## native_country India -1.150063 0.574227 -2.003
## native_country Iran -0.768290 0.653444 -1.176
## native_country Ireland 0.098273 0.727965 0.135
## native_country Italy -0.120160 0.607382 -0.198
## native_country Jamaica -0.533367 0.659353 -0.809
## native_country Japan -1.033857 0.616636 -1.677
## native_country Laos -2.210085 0.997514 -2.216
## native_country Mexico -1.379802 0.571595 -2.414
## native_country Nicaragua -1.168116 0.841921 -1.387
## native_country Outlying-US(Guam-USVI-etc) -1.571173 1.198919 -1.310
## native_country Peru -1.577161 0.815533 -1.934
## native_country Philippines -0.648910 0.556503 -1.166
## native_country Poland -0.712531 0.631853 -1.128
## native_country Portugal -0.009100 0.660699 -0.014
## native_country Puerto-Rico -0.907674 0.617097 -1.471
## native_country Scotland -2.060590 0.980842 -2.101
## native_country South -2.194233 0.637861 -3.440
## native_country Taiwan -1.106390 0.655435 -1.688
## native_country Thailand -1.802995 0.840100 -2.146
## native_country Trinadad&Tobago -2.017349 0.981409 -2.056
## native_country United-States -0.583529 0.542016 -1.077
## native_country Vietnam -2.040055 0.714166 -2.857
## native_country Yugoslavia -0.017572 0.781595 -0.022
## Pr(>|z|)
## (Intercept) 0.00000000013328860 ***
## age < 0.0000000000000002 ***
## workclass Local-gov 0.00000000000501201 ***
## workclass Private 0.00000000027161593 ***
## workclass Self-emp-inc 0.000561 ***
## workclass Self-emp-not-inc < 0.0000000000000002 ***
## workclass State-gov 0.00000000000000683 ***
## workclass Without-pay 0.080482 .
## education 11th 0.588802
## education 12th 0.035788 *
## education 1st-4th 0.246075
## education 5th-6th 0.351039
## education 7th-8th 0.018889 *
## education 9th 0.246318
## education Assoc-acdm < 0.0000000000000002 ***
## education Assoc-voc < 0.0000000000000002 ***
## education Bachelors < 0.0000000000000002 ***
## education Doctorate < 0.0000000000000002 ***
## education HS-grad 0.00000000013641859 ***
## education Masters < 0.0000000000000002 ***
## education Preschool 0.158127
## education Prof-school < 0.0000000000000002 ***
## education Some-college < 0.0000000000000002 ***
## education_num NA
## marital_status Married-AF-spouse 0.00000006892560322 ***
## marital_status Married-civ-spouse < 0.0000000000000002 ***
## marital_status Married-spouse-absent 0.317483
## marital_status Never-married 0.00000000814056487 ***
## marital_status Separated 0.956832
## marital_status Widowed 0.348399
## occupation Armed-Forces 0.821813
## occupation Craft-repair 0.357703
## occupation Exec-managerial < 0.0000000000000002 ***
## occupation Farming-fishing < 0.0000000000000002 ***
## occupation Handlers-cleaners 0.00000000268633981 ***
## occupation Machine-op-inspct 0.000585 ***
## occupation Other-service < 0.0000000000000002 ***
## occupation Priv-house-serv 0.008374 **
## occupation Prof-specialty 0.00000000000000492 ***
## occupation Protective-serv 0.00000139378916839 ***
## occupation Sales 0.000101 ***
## occupation Tech-support 0.00000000059243747 ***
## occupation Transport-moving 0.269981
## relationship Not-in-family 0.021292 *
## relationship Other-relative 0.016885 *
## relationship Own-child 0.008040 **
## relationship Unmarried 0.154132
## relationship Wife < 0.0000000000000002 ***
## race Asian-Pac-Islander 0.00006472640821200 ***
## race Black 0.050041 .
## race Other 0.069375 .
## race White 0.001873 **
## sex Male < 0.0000000000000002 ***
## capital_gain < 0.0000000000000002 ***
## capital_loss < 0.0000000000000002 ***
## hours_per_week < 0.0000000000000002 ***
## native_country Canada 0.778170
## native_country China 0.010962 *
## native_country Columbia 0.000527 ***
## native_country Cuba 0.387870
## native_country Dominican-Republic 0.027026 *
## native_country Ecuador 0.152581
## native_country El-Salvador 0.068389 .
## native_country England 0.643006
## native_country France 0.990013
## native_country Germany 0.301305
## native_country Greece 0.162632
## native_country Guatemala 0.169038
## native_country Haiti 0.543123
## native_country Honduras 0.599103
## native_country Hong 0.105127
## native_country Hungary 0.679883
## native_country India 0.045199 *
## native_country Iran 0.239693
## native_country Ireland 0.892614
## native_country Italy 0.843176
## native_country Jamaica 0.418558
## native_country Japan 0.093619 .
## native_country Laos 0.026719 *
## native_country Mexico 0.015781 *
## native_country Nicaragua 0.165307
## native_country Outlying-US(Guam-USVI-etc) 0.190030
## native_country Peru 0.053125 .
## native_country Philippines 0.243594
## native_country Poland 0.259453
## native_country Portugal 0.989011
## native_country Puerto-Rico 0.141325
## native_country Scotland 0.035655 *
## native_country South 0.000582 ***
## native_country Taiwan 0.091407 .
## native_country Thailand 0.031860 *
## native_country Trinadad&Tobago 0.039824 *
## native_country United-States 0.281664
## native_country Vietnam 0.004283 **
## native_country Yugoslavia 0.982064
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 50644 on 45220 degrees of freedom
## Residual deviance: 29306 on 45127 degrees of freedom
## AIC: 29494
##
## Number of Fisher Scoring iterations: 8
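# Calculating predictions (fitted probabilities on the data used for the fit)
Z=predict(glm.fit,type="response")
# glm models the probability of the second factor level (" >50K"),
# so label "1" marks a predicted >50K and label "2" a predicted <=50K.
Z=ifelse(Z >.5,"1","2")
# drawing contingency table of the predictions vs. the real values
tbl<-table(Z,glm.fit$model$salary)
tbl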
##
## Z <=50K >50K
## 1 2435 6787
## 2 31578 4421
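The counts in the table above can be condensed into a few summary rates. The sketch below re-enters the four counts by hand so that it runs without `glm.fit`, and it assumes, as the table suggests, that label "1" stands for a predicted >50K. The names `accuracy`, `sensitivity` and `specificity` are illustrative only, and >50K is treated as the positive class here. Because the predictions were made on the same rows the model was fit on, these are training-set figures and are likely to be somewhat optimistic.

```r
# Counts copied from the contingency table printed above
# (rows: predicted label Z, columns: actual salary).
tbl <- matrix(c(2435, 31578, 6787, 4421), nrow = 2,
              dimnames = list(Z = c("1", "2"),
                              actual = c(" <=50K", " >50K")))

# Assumption: "1" = predicted >50K, "2" = predicted <=50K.
correct <- tbl["1", " >50K"] + tbl["2", " <=50K"]
accuracy <- correct / sum(tbl)

# Treating >50K as the positive class (a choice made for this sketch):
sensitivity <- tbl["1", " >50K"] / sum(tbl[, " >50K"])   # true positive rate
specificity <- tbl["2", " <=50K"] / sum(tbl[, " <=50K"]) # true negative rate

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity), 3)
```

The gap between the two class-wise rates reflects the class imbalance: about three quarters of the rows are <=50K, so a 0.5 cut-off favours that class.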
Based on the regression summary above, significant variables associated with the outcome include the following (a positive association means higher odds of a salary >50K):
* age - positively associated
* workclass Self-emp-not-inc - negatively associated
* education Bachelors - positively associated
* education Doctorate - positively associated
* education Masters - positively associated
* education Prof-school - positively associated
* occupation Exec-managerial - positively associated
* occupation Tech-support - positively associated
* relationship Wife - positively associated
* sex Male - positively associated
* capital_gain - positively associated
* capital_loss - positively associated
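A list like the one above can also be pulled out of the fitted model programmatically instead of being read off the printout. The following is a minimal sketch on simulated data; the data frame `d`, the predictors `x1` and `x2`, and the 0.001 cut-off are assumptions made for illustration and are not part of the analysis above. It uses `coef(summary(fit))`, which returns the coefficient matrix with the columns shown in the summary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for the census data: x1 drives the outcome, x2 does not.
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x1))

fit <- glm(y ~ x1 + x2, family = binomial, data = d)

# Coefficient matrix: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(fit))

# Keep terms below the chosen p-value cut-off, ordered by |z|
sig <- coefs[coefs[, "Pr(>|z|)"] < 0.001, , drop = FALSE]
sig <- sig[order(-abs(sig[, "z value"])), , drop = FALSE]

# Direction of association and odds ratio for each retained term
sig_tbl <- data.frame(
  term       = rownames(sig),
  direction  = ifelse(sig[, "Estimate"] > 0, "positive", "negative"),
  odds_ratio = exp(sig[, "Estimate"]),
  z          = sig[, "z value"],
  row.names  = NULL
)
sig_tbl
```

Applied to the census model, the same steps would use the fitted `glm.fit` object in place of `fit`. Ordering by the absolute z value gives a rough ranking of how strongly each term is supported by the data.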
# check the factor levels of the salary column
levels(noNAData$salary)
## [1] " <=50K" " >50K"
library(caret) # provides confusionMatrix()
adult.cmplt<- noNAData
errorLM<-numeric(100)
sensitivityLM<-numeric(100)
specificityLM<-numeric(100)
for ( iTry in 1:100 ) {
  # Randomly hold out 25% of the rows; "ratio" holds the test row indices
  ratio = sample(1:nrow(adult.cmplt), size = 0.25*nrow(adult.cmplt))
  test.data = adult.cmplt[ratio,] #Test dataset 25% of total
  train.data = adult.cmplt[-ratio,] #Train dataset 75% of total
  # Print the structure of the training data once, not on every iteration
  if ( iTry == 1 ) str(train.data)
  # Logistic Regression Model
  glm.fit<- glm(salary~., family=binomial(link='logit'),data = train.data)
  # Let predict() accept native_country levels that may be missing from the training split
  glm.fit$xlevels[["native_country"]]<-union(glm.fit$xlevels[["native_country"]],levels(test.data$native_country))
  # Predicted probability of the second level (" >50K") for the test rows
  glm.pred<- predict(glm.fit, test.data, type = "response")
  pred.class<-factor(ifelse(glm.pred>0.5," >50K"," <=50K"),levels=levels(test.data$salary))
  # check classification performance
  # confusionMatrix() takes the predictions first and the reference second;
  # by default the first level (" <=50K") is treated as the positive class
  cm<-confusionMatrix(data=pred.class, reference=test.data$salary)
  sensitivityLM[iTry]<-cm$byClass['Sensitivity']
  specificityLM[iTry]<-cm$byClass['Specificity']
  errorLM[iTry]<-1-cm$overall['Accuracy']
}
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 13 10 16 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 10 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
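The repeated 75/25 hold-out procedure above can be illustrated in a self-contained form. The sketch below is not the analysis itself: it runs on simulated data (the data frame `sim`, the predictors `x1` and `x2`, the outcome labels "low" and "high", and the 20 repetitions are all assumptions made for illustration) and computes the error, sensitivity and specificity in base R rather than with caret. To mirror caret's default, the first factor level is treated as the positive class.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for adult.cmplt: two numeric predictors, binary factor outcome
n <- 2000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- factor(ifelse(runif(n) < plogis(-1 + 1.5 * sim$x1), "high", "low"),
                levels = c("low", "high"))

nTries <- 20
err <- sens <- spec <- numeric(nTries)

for (i in seq_len(nTries)) {
  # Hold out 25% of the rows for testing, fit on the remaining 75%
  test.idx <- sample(nrow(sim), size = floor(0.25 * nrow(sim)))
  fit  <- glm(y ~ ., family = binomial, data = sim[-test.idx, ])
  prob <- predict(fit, sim[test.idx, ], type = "response")  # P(y == "high")
  pred <- factor(ifelse(prob > 0.5, "high", "low"), levels = levels(sim$y))
  actual <- sim$y[test.idx]

  err[i]  <- mean(pred != actual)
  # First level ("low") plays the role of the positive class
  sens[i] <- sum(pred == "low"  & actual == "low")  / sum(actual == "low")
  spec[i] <- sum(pred == "high" & actual == "high") / sum(actual == "high")
}

# Mean and standard deviation of each metric across the repetitions
res <- sapply(list(error = err, sensitivity = sens, specificity = spec),
              function(v) c(mean = mean(v), sd = sd(v)))
round(res, 3)
```

Reporting the spread across repetitions alongside the mean gives a sense of how much the test error depends on the particular split, which is useful when comparing logistic regression with the other classifiers.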
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 6 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 7 13 10 10 10 8 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 4 5 3 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 8 10 4 10 1 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 2 2 1 1 4 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 2 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 1.0934 -0.798 -0.1171 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 6 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 2 10 13 13 10 10 10 8 6 ...
## $ education_num : num [1:33916, 1] 1.13 -1.22 1.13 1.52 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 5 3 3 5 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 10 4 10 1 12 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 6 2 1 1 4 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 5 2 5 3 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 38 38 18 38 38 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] -0.798 -0.117 0.791 1.018 0.261 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 3 3 3 5 3 3 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 13 7 12 10 16 10 12 12 11 ...
## $ education_num : num [1:33916, 1] 1.129 1.52 -2.005 -0.438 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 4 3 3 3 5 5 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 10 4 8 4 4 4 1 5 7 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 6 6 2 1 1 1 4 4 5 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 3 5 3 5 5 3 5 5 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 1 1 1 2 2 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 0.543 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -2.0768 0.3383 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 5 38 22 38 38 38 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 1 1 1 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 5 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 7 12 10 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 4 3 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 8 4 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 2 1 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 5 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 13 16 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 10 4 8 4 10 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] -0.0415 1.0934 -0.1171 0.7907 1.0177 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 3 3 3 3 5 3 3 6 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 12 2 13 7 12 10 16 10 10 8 ...
## $ education_num : num [1:33916, 1] -0.438 -1.222 1.52 -2.005 -0.438 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 1 3 3 4 3 3 3 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 6 6 4 8 4 4 4 10 1 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 2 1 1 1 1 4 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 3 5 5 3 2 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 2 2 2 1 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -2.0768 0.3383 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 22 38 38 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 13 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 10 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 5 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 10 16 10 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 4 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 13 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 10 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 12 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.866 -0.798 0.791 -0.571 0.261 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 7 13 10 16 10 8 6 12 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -2 1.52 1.13 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 4 5 3 3 3 5 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 10 8 10 4 4 10 12 14 5 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 6 2 2 1 1 1 2 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 3 2 3 1 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 1 1 1 2 2 2 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 1.73 0.543 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -2.0768 0.7547 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 22 38 38 38 18 38 25 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 6 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 10 10 8 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 1 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 4 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 2 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 5 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 7 12 13 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 3 5 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 8 4 10 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 2 1 2 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 3 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 22 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 12 13 16 6 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 4 10 4 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 5 3 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 38 38 38 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 -0.571 -0.1171 -0.6467 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 6 3 3 3 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 13 16 10 10 6 12 2 13 ...
## $ education_num : num [1:33916, 1] 1.1287 -0.4381 1.5204 -0.0464 1.1287 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 5 3 3 5 3 5 3 1 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 10 1 14 7 12 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 2 1 1 4 1 5 1 5 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 2 5 1 5 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 2 2 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 1.73 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 0.7547 3.2531 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 18 38 25 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 2 2 2 1 1 1 1 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 13 10 16 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.866 1.093 -0.117 0.791 0.261 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 2 13 7 10 16 10 10 8 6 ...
## $ education_num : num [1:33916, 1] 1.13 -1.22 1.52 -2 1.13 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 3 4 3 3 3 5 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 4 8 4 4 10 1 12 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 1 6 2 1 1 1 4 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 3 5 3 2 5 3 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 2 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 0.543 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -2.0768 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 22 38 38 18 38 38 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.866 1.093 0.791 1.018 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 5 3 3 3 6 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 2 7 12 13 10 16 10 8 6 ...
## $ education_num : num [1:33916, 1] 1.129 -1.222 -2.005 -0.438 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 4 3 5 3 3 3 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 8 4 10 4 4 10 12 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 1 2 1 2 1 1 1 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 5 3 2 3 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 2 1 2 2 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -2.0768 0.3383 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 22 38 38 38 38 18 38 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 13 10 8 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 5 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 10 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 2 1 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 2 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 10 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 1.0934 -0.798 -0.1171 1.0177 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 5 3 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 2 10 13 12 13 10 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -1.222 1.129 1.52 -0.438 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 5 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 4 10 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 6 1 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.3383 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 38 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 10 16 8 12 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 4 12 7 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 2 5 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 3 3 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.1171 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 6 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 7 12 13 10 10 2 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 4 8 4 10 10 1 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 2 1 2 1 4 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 2 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 1 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 16 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 1.0177 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 5 3 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 12 13 10 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 -0.438 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 5 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 10 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 1 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 2 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.3383 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 13 10 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 5 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 10 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 2 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 1 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 5 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 7 12 13 10 10 10 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 -2 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 4 3 5 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 8 4 10 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 2 1 2 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 3 5 5 5 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 22 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 0.7907 1.0177 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 3 3 6 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 7 12 13 10 16 10 10 12 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -2.005 -0.438 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 4 3 5 3 3 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 8 4 10 4 4 10 1 5 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 2 1 2 1 1 1 4 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 5 3 2 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 2 1 2 2 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -2.0768 0.3383 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 22 38 38 38 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 13 16 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 10 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 3 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.1171 0.7907 1.0177 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 5 3 3 6 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 13 7 12 10 16 10 10 12 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 1.52 -2.005 -0.438 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 4 3 3 3 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 4 8 4 4 4 10 1 5 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 2 1 1 1 1 4 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 5 3 2 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 2 2 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -2.0768 0.3383 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 22 38 38 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.1171 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 7 12 13 16 10 8 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 4 8 4 10 4 10 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 2 1 2 1 1 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 -0.798 -0.1171 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 13 10 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 5 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 8 10 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 6 6 2 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 10 10 10 6 12 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 3 5 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 8 4 10 1 14 5 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 2 1 1 4 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 2 5 1 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 18 38 25 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 10 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 1.0177 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 6 3 3 5 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 12 13 10 10 6 12 12 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 -0.438 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 5 3 5 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 4 10 10 1 14 5 7 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 1 2 1 4 1 4 5 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 2 5 1 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 0.3383 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 25 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 1 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.798 0.2612 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 10 16 10 10 8 6 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 3 3 5 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 10 4 4 10 1 12 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 1 1 1 4 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 2 5 3 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 2 2 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 0.543 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 18 38 38 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 13 10 16 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 13 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 10 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 1.0934 -0.571 0.2612 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 6 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 2 13 10 16 10 10 12 12 2 ...
## $ education_num : num [1:33916, 1] 1.1287 -1.2215 1.5204 1.1287 -0.0464 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 5 3 3 3 5 5 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 4 10 1 5 7 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 1 1 4 4 5 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 5 3 2 5 5 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 2 2 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 1.73 0.543 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 0.7547 -0.0781 3.2531 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 2 2 2 2 1 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 12 13 16 10 6 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 8 4 10 4 10 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 2 1 2 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 18 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 13 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 5 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.1171 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 6 3 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 13 10 10 8 6 12 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 5 3 5 5 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 4 10 10 1 12 14 7 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 2 1 4 2 1 5 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 2 5 3 1 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 38 25 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 1 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 16 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 5 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 10 16 10 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 4 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 10 4 8 4 4 4 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 6 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.798 -0.1171 0.7907 1.0177 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 5 3 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 13 7 12 13 10 16 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 1.52 -2.005 -0.438 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 4 3 5 3 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 10 4 8 4 10 4 4 10 1 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 6 6 2 1 2 1 1 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 3 5 5 5 3 2 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 1 1 1 2 1 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -2.0768 0.3383 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 38 22 38 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 0.7907 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 3 6 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 12 13 16 10 8 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 8 4 10 4 10 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 2 1 2 1 1 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 18 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 10 16 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 4 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 3 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 1.0177 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 5 3 3 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 12 13 10 16 8 6 ...
## $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -0.438 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 5 3 3 5 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 4 10 4 4 12 14 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 1 2 1 1 2 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 5 5 3 3 1 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 1 2 2 2 2 ...
## $ capital_gain : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 0.3383 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 38 38 38 38 38 25 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0177 -0.571 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 5 3 3 6 3 3 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 13 16 10 10 8 6 12 ...
## $ education_num : num [1:33916, 1] 1.1287 1.1287 -0.4381 1.5204 -0.0464 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 5 3 3 5 5 3 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 4 10 4 10 1 12 14 5 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 2 1 1 4 2 1 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 5 3 2 5 3 1 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 2 2 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 1.73 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 0.3383 0.7547 3.2531 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 38 25 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 2 2 2 2 1 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 -0.798 0.7907 1.0177 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 3 6 3 3 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 7 12 13 16 10 10 8 12 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -2.005 -0.438 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 4 3 5 3 3 5 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 10 8 4 10 4 10 1 12 5 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 6 2 1 2 1 1 4 2 4 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 3 2 5 3 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 1 1 2 1 2 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -2.0768 0.3383 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 22 38 38 38 18 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 13 10 16 ...
## $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 5 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 2 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0177 -0.571 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 5 3 3 6 3 5 5 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 12 13 16 10 8 12 13 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -0.438 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 5 3 3 5 5 1 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 4 10 4 10 12 5 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 2 1 1 2 4 5 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 5 5 3 2 3 5 5 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 2 2 2 1 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 0.3383 0.7547 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 38 18 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 6 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 10 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 10 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 2 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 18 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 16 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 4 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame': 33916 obs. of 14 variables:
## $ age : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
## $ workclass : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 5 3 6 3 3 ...
## $ education : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 12 13 10 10 8 ...
## $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
## $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 5 3 5 5 ...
## $ occupation : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 4 10 10 1 12 ...
## $ relationship : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 1 2 1 4 2 ...
## $ race : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 5 5 2 5 3 ...
## $ sex : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 1 2 1 2 ...
## $ capital_gain : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
## $ capital_loss : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
## $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
## $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 38 38 18 38 38 ...
## $ salary : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
## .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
mean(sensitivityLM)
## [1] 0.8763368
mean(specificityLM)
## [1] 0.7327439
mean(errorLM)
## [1] 0.1529889
Over repeated random splits of the data into training and test sets, logistic regression reaches a mean sensitivity of about 88%, a specificity of about 73% and an accuracy of about 85% (mean test error 15.3%).
A comparison with random forest and SVM follows in sub-problem 5 below.
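The sensitivity and specificity figures depend on which class is treated as positive and on which side of the confusion table holds the predictions. The following is a minimal base-R sketch, not the code used for the numbers above; `classMetrics` and the toy labels are hypothetical. It names the positive class explicitly, here `<=50K`, which is also what caret's `confusionMatrix()` is documented to pick by default (the first factor level) for a two-class problem.

```r
# Hypothetical helper (not part of the original analysis): sensitivity,
# specificity and error for a two-class problem, with the positive class
# named explicitly so the orientation of the table cannot be confused.
classMetrics <- function(truth, pred, positive) {
  truth <- as.character(truth)
  pred  <- as.character(pred)
  tp <- sum(pred == positive & truth == positive)
  fn <- sum(pred != positive & truth == positive)
  tn <- sum(pred != positive & truth != positive)
  fp <- sum(pred == positive & truth != positive)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    error       = (fp + fn) / length(truth))
}

# Toy labels, invented for illustration only
truth <- c("<=50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K")
pred  <- c("<=50K", "<=50K", "<=50K", ">50K",  ">50K", "<=50K")
m <- classMetrics(truth, pred, positive = "<=50K")
print(m)
```

Swapping `truth` and `pred` in the call leaves the error unchanged but turns the other two numbers into per-class predictive values, which is why the orientation matters.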
Develop random forest model of the categorized income. Present variable importance plots and comment on relative importance of different attributes in the model. Did attributes showing up as more important in random forest model also appear as significantly associated with the outcome by logistic regression? Test model performance on multiple splits of data into training and test subsets, compare test and out-of-bag error estimates, summarize model performance in terms of accuracy/error, sensitivity/specificity and compare to the performance of other methods reported in the dataset description.
# Random forest on whole data
rfOutput <- randomForest(factor(salary)~., importance=TRUE,data=noNAData)
# variable(s) importance plot
varImpPlot(rfOutput)
plot(rfOutput)
legend("top", colnames(rfOutput$err.rate),col=1:3,cex=0.8,fill=1:3)
# test model performance with Random forest
errorRF<-numeric(100)
errorRFmTry<-numeric(100)
sensitivityRF<-numeric(100)
specificityRF<-numeric(100)
# put all these in a loop
for ( iTry in 1:100 ) {
  # random split: each row goes to the training set with probability 1/2
  bTrain <- sample(c(FALSE,TRUE),nrow(noNAData),replace=TRUE)
  testTruth <- factor(noNAData[!bTrain,]$salary)
  # forest with the default mtry
  rfRes <- randomForest(factor(salary)~., importance=TRUE,data=noNAData[bTrain,])
  # confusionMatrix() expects predictions in rows and true classes in columns
  rfTbl <- table(predict(rfRes,newdata=noNAData[!bTrain,]),testTruth)
  # forest with mtry=5
  rfResmTRy <- randomForest(factor(salary)~., importance=TRUE,data=noNAData[bTrain,],mtry=5)
  rfTblmTry <- table(predict(rfResmTRy,newdata=noNAData[!bTrain,]),testTruth)
  cm<-confusionMatrix(rfTbl)
  cmmTRy<-confusionMatrix(rfTblmTry)
  sensitivityRF[iTry]<-cm$byClass['Sensitivity']
  specificityRF[iTry]<-cm$byClass['Specificity']
  errorRF[iTry]<-1-cm$overall['Accuracy']
  # the mtry=5 error must come from the mtry=5 confusion matrix, not from cm
  errorRFmTry[iTry]<-1-cmmTRy$overall['Accuracy']
}
mean(sensitivityRF)
## [1] 0.807315
mean(specificityRF)
## [1] 0.9578385
mean(errorRF)
## [1] 0.1818734
mean(errorRFmTry)
## [1] 0.1818734
The variable importance plots show that capital_gain, capital_loss and marital_status are among the most important attributes. “MeanDecreaseAccuracy” is the mean decrease in accuracy of the out-of-bag predictions when the values of a given variable are randomly permuted.
“MeanDecreaseGini” measures the average gain in node purity from splits on a given variable. For this data the leading variables by this measure are capital_gain, relationship and age.
The error-rate plot for rfOutput shows that the OOB error and the class-specific errors for <=50K and >50K behave in the same way and settle at around 50 trees.
Over repeated random splits of the data into training and test sets, the random forest shows a mean sensitivity of about 81%, a specificity of about 96% and an accuracy of about 82%.
A comparison with logistic regression and SVM follows in sub-problem 5 below.
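The out-of-bag (OOB) error mentioned above relies on the fact that each tree is grown on a bootstrap sample, so some rows are never drawn for that tree and can act as its test set. A minimal base-R sketch of that idea on made-up sizes (`n` and `nTrees` are arbitrary assumptions, and no forest is actually fitted):

```r
set.seed(1)
n <- 10000       # arbitrary number of rows, for illustration only
nTrees <- 50     # arbitrary number of bootstrap samples
oobFrac <- numeric(nTrees)
for (i in seq_len(nTrees)) {
  inBag <- sample(n, n, replace = TRUE)        # bootstrap sample for one tree
  oobFrac[i] <- 1 - length(unique(inBag)) / n  # share of rows never drawn
}
mean(oobFrac)    # close to exp(-1), i.e. a bit more than one third of the rows
```

Because every row is out of bag for roughly a third of the trees, the OOB error is an estimate of test error that needs no separate hold-out set, which is what makes it worth comparing with the test error from explicit splits.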
Develop SVM model of this data choosing parameters (e.g. choice of kernel, cost, etc.) that appear to yield better performance. Test model performance on multiple splits of data into training and test subsets, summarize model performance in terms of accuracy/error, sensitivity/specificity and compare to the performance of other methods reported in the dataset description.
# run tuning on SVM on the whole data once to get optimal values of cost & gamma
# working on a subset as the whole data is taking a lot of time
tune.out=tune(svm,as.factor(salary) ~ .,data=noNAData,kernel="radial",ranges=list(cost=c( 1,2,5,10,20, 100),gamma=c(0.01,0.02,0.05,0.1,0.2)),scale = FALSE)
cValue<-tune.out$best.parameters$cost
gValue<-tune.out$best.parameters$gamma
#run the SVM
svmfit=svm(as.factor(salary) ~ ., data=noNAData, kernel="radial",cost=cValue,gamma=gValue)
summary(svmfit)
##
## Call:
## svm(formula = as.factor(salary) ~ ., data = noNAData, kernel = "radial",
## cost = cValue, gamma = gValue)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 2
## gamma: 0.2
##
## Number of Support Vectors: 16514
##
## ( 9132 7382 )
##
##
## Number of Classes: 2
##
## Levels:
## <=50K >50K
sensitivitySVM<-numeric(100)
specificitySVM<-numeric(100)
errorSVM<-numeric(100)
smp_size <- floor(0.80 * nrow(noNAData))
# put all these in a loop
for ( iTry in 1:100 ) {
  # indices are drawn with replacement; the rows never drawn form the test set
  train_ind <- sample(seq_len(nrow(noNAData)), size = smp_size,replace = TRUE)
  train <- noNAData[train_ind, ]
  test <- noNAData[-train_ind, ]
  # cost and gamma are fixed at the values tuned above, so tune() evaluates a single combination
  tune.out=tune(svm,as.factor(salary) ~ .,data=train,kernel="radial",ranges=list(cost=cValue,gamma=gValue),scale = FALSE)
  bestmod=tune.out$best.model
  pOut<-predict(bestmod,test[,-14])
  # rows are predictions and columns are true classes, as confusionMatrix() expects
  tbl<-table(predict=pOut, truth=test[,14])
  cm<-confusionMatrix(tbl)
  sensitivitySVM[iTry]<-cm$byClass['Sensitivity']
  specificitySVM[iTry]<-cm$byClass['Specificity']
  errorSVM[iTry]<-1-cm$overall['Accuracy']
}
mean(sensitivitySVM)
## [1] 0.928466
mean(specificitySVM)
## [1] 0.6135384
mean(errorSVM)
## [1] 0.1496539
The SVM analysis took a very long time to run, so only 1000 observations were selected.
Over repeated splits of the data into training and test sets, the SVM shows a mean sensitivity of about 93%, a specificity of about 61% and an accuracy of about 85%.
A comparison with logistic regression and random forest follows in sub-problem 5 below.
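The SVM loop above draws its training indices with `replace = TRUE`, so the resulting split is a bootstrap-style split rather than a plain 80/20 partition. The base-R sketch below, on an arbitrary made-up `n`, shows how much data ends up in the test set under each choice. It is only an illustration of the sampling step, not a re-run of the analysis.

```r
set.seed(1)
n <- 10000                     # arbitrary number of rows, for illustration only
size <- floor(0.80 * n)

idxBoot  <- sample(seq_len(n), size = size, replace = TRUE)   # as in the loop above
idxSplit <- sample(seq_len(n), size = size, replace = FALSE)  # plain 80/20 split

# Rows left for testing when the data frame is indexed with -idx
nTestBoot  <- n - length(unique(idxBoot))
nTestSplit <- n - length(unique(idxSplit))

c(boot = nTestBoot / n, split = nTestSplit / n)
```

With replacement, the training set contains repeated rows and the test share is close to exp(-0.8), roughly 45% of the data. Without replacement it is exactly the remaining 20%.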
Compare performance of the models developed above (logistic regression, random forest, SVM) in terms of their accuracy, error and sensitivity/specificity. Comment on differences and similarities between them.
# Box plots comparing the LR, RF and SVM models
# sensitivity
boxplot(list(LR=sensitivityLM,RF=sensitivityRF,SVM=sensitivitySVM))
# test error; RFmtry5 is the random forest fitted with mtry=5
boxplot(list(LR=errorLM,RF=errorRF,SVM=errorSVM,RFmtry5=errorRFmTry))
# specificity
boxplot(list(LR=specificityLM,RF=specificityRF,SVM=specificitySVM))
The box plots above compare logistic regression (LR), random forest (RF) and SVM in terms of sensitivity, specificity and error.
Sensitivity
SVM has the highest sensitivity, followed by LR. RF has the lowest.
Accuracy
SVM and LR have almost the same accuracy (mean test errors of about 15.0% and 15.3%), while RF is somewhat less accurate (mean test error of about 18.2%).
Specificity
RF has the highest specificity, followed by LR and then SVM.
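To read the three models side by side, the mean metrics can also be collected in one small table. The sketch below simply copies the means printed earlier in this report into a data frame and derives accuracy as one minus error. The data frame name `perf` is an assumption made for illustration.

```r
# Mean metrics copied from the printed output earlier in this report
perf <- data.frame(
  model       = c("LR", "RF", "SVM"),
  sensitivity = c(0.8763368, 0.807315, 0.928466),
  specificity = c(0.7327439, 0.9578385, 0.6135384),
  error       = c(0.1529889, 0.1818734, 0.1496539)
)
perf$accuracy <- 1 - perf$error

# Sort by error so the most accurate model comes first
print(perf[order(perf$error), ], digits = 3)
```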
Develop KNN model for this data, evaluate its performance for different values of \(k\) on different splits of the data into training and test and compare it to the performance of other methods reported in the dataset description. Notice that this dataset includes many categorical variables as well as continuous attributes measured on different scales, so that the distance has to be defined to be meaningful (probably avoiding subtraction of the numerical values of multi-level factors directly or adding differences between untransformed age and capital gain/loss attributes).
# KNN tuned by 10-fold cross-validation after converting categorical variables to numeric
knn.cross <- tune.knn(x = noNAData.num[,-14], y = as.factor(noNAData.num[,14]), k = 1:50,tunecontrol=tune.control(sampling = "cross", cross=10))
#Summarize the resampling results set
summary(knn.cross)
##
## Parameter tuning of 'knn.wrapper':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## k
## 18
##
## - best performance: 0.1657858
##
## - Detailed performance results:
## k error dispersion
## 1 1 0.1967448 0.003495239
## 2 2 0.1973417 0.005436267
## 3 3 0.1782578 0.002864099
## 4 4 0.1787001 0.004112433
## 5 5 0.1730169 0.004224332
## 6 6 0.1714248 0.003880075
## 7 7 0.1685057 0.003546059
## 8 8 0.1687268 0.004870304
## 9 9 0.1681961 0.003911997
## 10 10 0.1692133 0.005270897
## 11 11 0.1675106 0.003904686
## 12 12 0.1666261 0.003966829
## 13 13 0.1664712 0.003938735
## 14 14 0.1675770 0.003261339
## 15 15 0.1660511 0.004744933
## 16 16 0.1668029 0.004101612
## 17 17 0.1663607 0.004378648
## 18 18 0.1657858 0.004135437
## 19 19 0.1659848 0.004606541
## 20 20 0.1658522 0.004122050
## 21 21 0.1666262 0.004269018
## 22 22 0.1665819 0.003631125
## 23 23 0.1662281 0.003983717
## 24 24 0.1658301 0.004034915
## 25 25 0.1660733 0.004368836
## 26 26 0.1663608 0.004120839
## 27 27 0.1667146 0.003650546
## 28 28 0.1668251 0.004033352
## 29 29 0.1662281 0.004150954
## 30 30 0.1663165 0.004011329
## 31 31 0.1669799 0.004394028
## 32 32 0.1670905 0.004786991
## 33 33 0.1666924 0.004539544
## 34 34 0.1669357 0.005398175
## 35 35 0.1670462 0.004746109
## 36 36 0.1669135 0.004760006
## 37 37 0.1669578 0.004600875
## 38 38 0.1676212 0.004693426
## 39 39 0.1671126 0.004253952
## 40 40 0.1671790 0.004088081
## 41 41 0.1673780 0.004337776
## 42 42 0.1676213 0.004659308
## 43 43 0.1680636 0.005465432
## 44 44 0.1675328 0.005080343
## 45 45 0.1674886 0.005047222
## 46 46 0.1671789 0.005152738
## 47 47 0.1680635 0.004610312
## 48 48 0.1684394 0.005133739
## 49 49 0.1684615 0.005165149
## 50 50 0.1683731 0.005144078
plot(knn.cross)
knn.cross$best.parameters
## k
## 18 18
#Resampling using bootstrapping on full data set
knn.boot <- tune.knn(x = noNAData.num[,-14], y = as.factor(noNAData.num[,14]), k = 1:50,tunecontrol=tune.control(sampling = "boot") )
#Summarize the resampling results set
summary(knn.boot)
##
## Parameter tuning of 'knn.wrapper':
##
## - sampling method: bootstrapping
##
## - best parameters:
## k
## 25
##
## - best performance: 0.1714755
##
## - Detailed performance results:
## k error dispersion
## 1 1 0.2016709 0.002422931
## 2 2 0.2029142 0.002264352
## 3 3 0.1977014 0.002008570
## 4 4 0.1945068 0.002525054
## 5 5 0.1884362 0.003111751
## 6 6 0.1853747 0.003288427
## 7 7 0.1817465 0.002807767
## 8 8 0.1803274 0.003520797
## 9 9 0.1787451 0.002952522
## 10 10 0.1780873 0.002814019
## 11 11 0.1767073 0.002983977
## 12 12 0.1759795 0.003140198
## 13 13 0.1752343 0.002673389
## 14 14 0.1755010 0.002525206
## 15 15 0.1741370 0.002931879
## 16 16 0.1738447 0.002871186
## 17 17 0.1727826 0.003006141
## 18 18 0.1731727 0.003040984
## 19 19 0.1726582 0.002688510
## 20 20 0.1721918 0.002458680
## 21 21 0.1722126 0.002193793
## 22 22 0.1723376 0.002601168
## 23 23 0.1721416 0.002631828
## 24 24 0.1721224 0.002853456
## 25 25 0.1714755 0.002636775
## 26 26 0.1722034 0.002282562
## 27 27 0.1718445 0.002610336
## 28 28 0.1719802 0.002455831
## 29 29 0.1718078 0.002448981
## 30 30 0.1716327 0.002396892
## 31 31 0.1716822 0.002487311
## 32 32 0.1717959 0.002553038
## 33 33 0.1715732 0.002383614
## 34 34 0.1715078 0.002419299
## 35 35 0.1718124 0.002129954
## 36 36 0.1718835 0.002259084
## 37 37 0.1715404 0.002050448
## 38 38 0.1719209 0.002037610
## 39 39 0.1721002 0.002163424
## 40 40 0.1719425 0.002022634
## 41 41 0.1717959 0.002073330
## 42 42 0.1718450 0.001768809
## 43 43 0.1718935 0.002089969
## 44 44 0.1716652 0.002381223
## 45 45 0.1721390 0.002356919
## 46 46 0.1721610 0.002248196
## 47 47 0.1720521 0.002178913
## 48 48 0.1722910 0.002329793
## 49 49 0.1721170 0.002284532
## 50 50 0.1718510 0.002154651
plot(knn.boot)
knn.boot$best.parameters
## k
## 25 25
# Tune k on repeated 80/20 splits and record the test error at the chosen k
smp_size <- floor(0.80 * nrow(noNAData.num))
misCalk<-numeric(50)
kValues<-numeric(50)
for (x1 in 1:50){
  # draw a fresh training/test split on every iteration
  train_ind <- sample(seq_len(nrow(noNAData.num)), size = smp_size)
  knntrain <- noNAData.num[train_ind, ]
  knntest <- noNAData.num[-train_ind, ]
  knn1.pred <- tune.knn(x = knntrain[,-14],y = as.factor(knntrain[,14]),k = 1:50)
  # best.parameters is a one-row data frame, so extract k as a number
  kValues[x1]<-knn1.pred$best.parameters$k
  knnOutput<- knn(train = knntrain[,-14],test = knntest[,-14],cl = as.factor(knntrain[,14]),k = kValues[x1])
  knn1Tbl<- table(knnOutput,as.factor(knntest[,14]))
  misCalk[x1]<-1-(knn1Tbl[1,1]+knn1Tbl[2,2])/sum(knn1Tbl)
}
#misCalk
# Mean of the errors
mean(misCalk)
## [1] 0.1688027
plot(x=kValues,y=misCalk)
With bootstrap resampling the optimal value is k = 25, whereas with 10-fold cross-validation the optimal value is k = 18.
After splitting the data and tuning k several times, we can see that the minimal error is attained for k = 12 and k = 18.
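For KNN the distance only makes sense if factors are turned into indicator columns and the numeric columns are put on a common scale. The sketch below shows one possible way to do this in base R on a small invented data frame (`toy` and its values are assumptions, not the census data). Scaling the indicator columns as well is a design choice here, not the only option.

```r
# Toy data frame with one factor and two numeric columns on different scales
# (values invented for illustration)
toy <- data.frame(
  age          = c(25, 40, 58, 33),
  capital_gain = c(0, 5000, 0, 15000),
  workclass    = factor(c("Private", "State-gov", "Private", "Self-emp"))
)

# Expand the factor into 0/1 indicator columns; "- 1" drops the intercept
X <- model.matrix(~ . - 1, data = toy)

# Put every column on a comparable scale (mean 0, standard deviation 1)
Xs <- scale(X)

# Euclidean distances between rows, no longer dominated by capital_gain
D <- as.matrix(dist(Xs))
round(D, 2)
```

Without the scaling step the distance between any two rows would be driven almost entirely by capital_gain, whose raw values are orders of magnitude larger than the other columns.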
SVM does not appear to provide readily available tools for judging relative importance of different attributes in the model. Please evaluate here an approach similar to that employed by random forest where importance of any given attribute is measured by the decrease in model performance upon randomization of the values for this attribute.
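The permutation idea described here can be sketched independently of SVM. The example below is only an illustration under stated assumptions: it uses simulated data and a base-R logistic regression in place of an SVM, and measures importance as the average drop in test accuracy after shuffling one column. All names (`d`, `accuracy`, `permImportance`) are hypothetical.

```r
set.seed(1)
n <- 2000
# Simulated data: x1 drives the outcome strongly, x2 weakly, x3 not at all
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(2 * d$x1 + 0.5 * d$x2))

train <- d[1:1000, ]
test  <- d[1001:2000, ]
fit <- glm(y ~ x1 + x2 + x3, data = train, family = binomial)

# Share of test rows classified correctly at a 0.5 probability cut-off
accuracy <- function(model, data) {
  mean((predict(model, newdata = data, type = "response") > 0.5) == data$y)
}
baseAcc <- accuracy(fit, test)

# Importance = average drop in test accuracy after shuffling one column
permImportance <- sapply(c("x1", "x2", "x3"), function(v) {
  drops <- replicate(20, {
    shuffled <- test
    shuffled[[v]] <- sample(shuffled[[v]])
    baseAcc - accuracy(fit, shuffled)
  })
  mean(drops)
})
round(permImportance, 3)
```

The same loop applies to any fitted model with a `predict` method. Only the model fit and the performance measure (accuracy here, AUC in the code that follows) need to change.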
dmy <- dummyVars(" ~ .", data = noNAData, fullRank=T)
trsf <- data.frame(predict(dmy, newdata = noNAData))
#anyNA(trsf)
#split the data into training and test
splitIndex <- sample(nrow(trsf), floor(0.5*nrow(trsf)))
trainDF <- trsf[ splitIndex,]
testDF <- trsf[-splitIndex,]
outcomeName <- 'salary...50K'
predictorNames <- setdiff(names(trainDF),outcomeName)
# recode the 0/1 outcome as text labels, as caret requires for classification
# (0 corresponds to <=50K, 1 to >50K)
trainDF[,outcomeName] <- ifelse(trainDF[,outcomeName]==0,"lessthan50K","greaterthan50K")
testDF[,outcomeName] <- ifelse(testDF[,outcomeName]==0,"lessthan50K","greaterthan50K")
trainDF1=na.omit(trainDF)
#trctrl <- trainControl(method = "repeatedcv", classProbs=TRUE, returnResamp='none',summaryFunction=twoClassSummary,repeats = 3)
#svm.tune <- train(x=trainDF[,predictorNames],y= as.factor(trainDF[,outcomeName]),method = "svmRadial",tuneLength = 10,preProc = c("center","scale"), metric="ROC",trControl=trctrl)
trctrl <- trainControl(method = "repeatedcv", classProbs = TRUE,number=10,repeats = 3)
svm.tune <- train(salary...50K~.,data=trainDF,method = "svmRadial",tuneLength = 10,preProc = c("center","scale"),trControl=trctrl)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
##
## alpha
predictions <- predict(object=svm.tune, testDF[,predictorNames], type='prob')
# This is taken from stackoverflow as provided by the link
GetROC_AUC = function(probs, true_Y){
# AUC approximation
# http://stackoverflow.com/questions/4903092/calculate-auc-in-r
# ty AGS
probsSort = sort(probs, decreasing = TRUE, index.return = TRUE)
val = unlist(probsSort$x)
idx = unlist(probsSort$ix)
roc_y = true_Y[idx];
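# cumulative share of class 1 (x axis) and class 2 (y axis) as the threshold is lowered
stack_x = cumsum(roc_y == 1)/sum(roc_y == 1)
stack_y = cumsum(roc_y == 2)/sum(roc_y == 2)
# sum of rectangle areas; the parentheses make the index range 1:(n-1) explicit
auc = sum((stack_x[2:length(roc_y)]-stack_x[1:(length(roc_y)-1)])*stack_y[2:length(roc_y)])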
return(auc)
}
testOutcome <- ifelse(testDF[,outcomeName]=="lessthan50K",1,2)
refAUC <- GetROC_AUC(predictions[[1]],testOutcome )
print(paste('AUC score:', refAUC))
## [1] "AUC score: 0.779934704501859"
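As a cross-check on a step-wise AUC approximation like the one above, the AUC can also be computed from ranks using the Mann-Whitney identity: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal base-R sketch follows, with a hypothetical helper `rankAUC` and invented toy scores.

```r
# Hypothetical helper: AUC from the Mann-Whitney rank-sum identity
rankAUC <- function(score, isPositive) {
  r <- rank(score)                 # average ranks are used for ties
  nPos <- sum(isPositive)
  nNeg <- sum(!isPositive)
  (sum(r[isPositive]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Toy scores, invented for illustration
score      <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
isPositive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
aucToy <- rankAUC(score, isPositive)
aucToy   # 8 of the 9 positive/negative pairs are ordered correctly, so 8/9
```

For the model above, `score` would be the predicted probability of the class treated as positive and `isPositive` the matching logical vector of true labels.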
# Shuffle each predictor in turn and record how often the AUC drops
AUCShuffle <- NULL
shuffletimes <- 10
featuresMeanAUCs <- c()
for (feature in predictorNames) {
  featureAUCs <- c()
  shuffledData <- testDF[,predictorNames]
  for (iter in 1:shuffletimes) {
    shuffledData[,feature]<-sample(shuffledData[,feature],length(shuffledData[,feature]))
    predictions <- predict(object=svm.tune, shuffledData[,predictorNames], type='prob')
    # GetROC_AUC expects the numeric 1/2 coding in testOutcome, not the text labels
    featureAUCs <- c(featureAUCs,GetROC_AUC(predictions[[1]], testOutcome))
  }
  # importance = share of shuffles whose AUC falls below the unshuffled AUC
  featuresMeanAUCs <- c(featuresMeanAUCs, mean(featureAUCs < refAUC))
}
AUCShuffle <- data.frame('feature'=predictorNames, 'importance'=featuresMeanAUCs)
AUCShuffle <- AUCShuffle[order(AUCShuffle$importance, decreasing=TRUE),]
print(AUCShuffle)
## feature importance
## 1 age NA
## 2 workclass..Local.gov NA
## 3 workclass..Private NA
## 4 workclass..Self.emp.inc NA
## 5 workclass..Self.emp.not.inc NA
## 6 workclass..State.gov NA
## 7 workclass..Without.pay NA
## 8 education..11th NA
## 9 education..12th NA
## 10 education..1st.4th NA
## 11 education..5th.6th NA
## 12 education..7th.8th NA
## 13 education..9th NA
## 14 education..Assoc.acdm NA
## 15 education..Assoc.voc NA
## 16 education..Bachelors NA
## 17 education..Doctorate NA
## 18 education..HS.grad NA
## 19 education..Masters NA
## 20 education..Preschool NA
## 21 education..Prof.school NA
## 22 education..Some.college NA
## 23 education_num NA
## 24 marital_status..Married.AF.spouse NA
## 25 marital_status..Married.civ.spouse NA
## 26 marital_status..Married.spouse.absent NA
## 27 marital_status..Never.married NA
## 28 marital_status..Separated NA
## 29 marital_status..Widowed NA
## 30 occupation..Armed.Forces NA
## 31 occupation..Craft.repair NA
## 32 occupation..Exec.managerial NA
## 33 occupation..Farming.fishing NA
## 34 occupation..Handlers.cleaners NA
## 35 occupation..Machine.op.inspct NA
## 36 occupation..Other.service NA
## 37 occupation..Priv.house.serv NA
## 38 occupation..Prof.specialty NA
## 39 occupation..Protective.serv NA
## 40 occupation..Sales NA
## 41 occupation..Tech.support NA
## 42 occupation..Transport.moving NA
## 43 relationship..Not.in.family NA
## 44 relationship..Other.relative NA
## 45 relationship..Own.child NA
## 46 relationship..Unmarried NA
## 47 relationship..Wife NA
## 48 race..Asian.Pac.Islander NA
## 49 race..Black NA
## 50 race..Other NA
## 51 race..White NA
## 52 sex..Male NA
## 53 capital_gain NA
## 54 capital_loss NA
## 55 hours_per_week NA
## 56 native_country..Canada NA
## 57 native_country..China NA
## 58 native_country..Columbia NA
## 59 native_country..Cuba NA
## 60 native_country..Dominican.Republic NA
## 61 native_country..Ecuador NA
## 62 native_country..El.Salvador NA
## 63 native_country..England NA
## 64 native_country..France NA
## 65 native_country..Germany NA
## 66 native_country..Greece NA
## 67 native_country..Guatemala NA
## 68 native_country..Haiti NA
## 69 native_country..Honduras NA
## 70 native_country..Hong NA
## 71 native_country..Hungary NA
## 72 native_country..India NA
## 73 native_country..Iran NA
## 74 native_country..Ireland NA
## 75 native_country..Italy NA
## 76 native_country..Jamaica NA
## 77 native_country..Japan NA
## 78 native_country..Laos NA
## 79 native_country..Mexico NA
## 80 native_country..Nicaragua NA
## 81 native_country..Outlying.US.Guam.USVI.etc. NA
## 82 native_country..Peru NA
## 83 native_country..Philippines NA
## 84 native_country..Poland NA
## 85 native_country..Portugal NA
## 86 native_country..Puerto.Rico NA
## 87 native_country..Scotland NA
## 88 native_country..South NA
## 89 native_country..Taiwan NA
## 90 native_country..Thailand NA
## 91 native_country..Trinadad.Tobago NA
## 92 native_country..United.States NA
## 93 native_country..Vietnam NA
## 94 native_country..Yugoslavia NA
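The NA column is consistent with the AUC helper receiving non-numeric labels. The following is a small sketch of that failure mode, assuming the outcome is a factor whose levels are strings. The level name `morethan50K` and the three-row vector are made up for illustration, and the comparison value 0.78 simply rounds the reference AUC printed earlier.

```r
# Hypothetical three-row outcome; only "lessthan50K" appears in the code above.
roc_y <- factor(c("lessthan50K", "morethan50K", "lessthan50K"))

# Comparing a factor with the number 1 compares its labels with "1": never TRUE.
hits <- roc_y == 1
stack <- cumsum(hits) / sum(hits)   # 0/0 for every element
print(stack)

# Any comparison with NaN is NA, so the averaged importance is NA as well.
importance <- mean(stack < 0.78)
print(importance)
```

Recoding the outcome to 1/2 before the comparison, as `testOutcome` does, avoids the division by zero.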
RocImp <- filterVarImp(x = noNAData.bk[, -ncol(noNAData.bk)], y = noNAData.bk$salary)
head(RocImp)
## X...50K X..50K
## age 0.6819298 0.6819298
## workclass 0.5136817 0.5136817
## education 0.5212537 0.5212537
## education_num 0.7140418 0.7140418
## marital_status 0.6421344 0.6421344
## occupation 0.5366588 0.5366588
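Note that `head()` returns the first six rows in column order, not the six highest scores. Here is a minimal sketch of ranking by AUC, reusing only the six values printed above. The data frame name `RocImpHead` is hypothetical; on the full result one would sort `RocImp` itself in the same way.

```r
# Values copied from the head(RocImp) output above; column name as printed.
RocImpHead <- data.frame(
  X...50K = c(0.6819298, 0.5136817, 0.5212537, 0.7140418, 0.6421344, 0.5366588),
  row.names = c("age", "workclass", "education",
                "education_num", "marital_status", "occupation")
)

# Sort rows by AUC, highest first; drop = FALSE keeps the data frame shape.
ranked <- RocImpHead[order(RocImpHead$X...50K, decreasing = TRUE), , drop = FALSE]
print(ranked)
```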
For the extra 15 points (variable importance in SVM), the analysis was run on a subset of the data to reduce processing time before submission.
The number of shuffle repetitions per variable was also reduced from around 500 to 10. Both choices could affect which variables come out as important.
The shuffle-based variable importance algorithm was adapted from the steps described at http://amunategui.github.io/variable-importance-shuffler/ . For comparison I used filterVarImp.
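The shuffling idea can be sketched on simulated data in base R. This is not the code used for the SVM above: it swaps in a logistic regression, a rank-based AUC, and the mean drop in AUC as the importance score (the report's loop instead records the fraction of shuffles that fall below the reference AUC). All names here (`auc_rank`, `perm_importance`, `x1`, `x2`) are hypothetical.

```r
set.seed(1)

# Rank-based (Mann-Whitney) AUC: probability that a random positive
# gets a higher score than a random negative.
auc_rank <- function(score, is_pos) {
  r <- rank(score)                      # average ranks handle ties
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated data: x1 drives the outcome, x2 is pure noise.
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(2 * df$x1))

fit    <- glm(y ~ x1 + x2, data = df, family = binomial)
refAUC <- auc_rank(predict(fit, df, type = "response"), df$y == 1)

# Importance = mean drop in AUC over repeated shuffles of one column.
perm_importance <- function(feature, times = 20) {
  drops <- replicate(times, {
    shuffled <- df
    shuffled[[feature]] <- sample(shuffled[[feature]])
    refAUC - auc_rank(predict(fit, shuffled, type = "response"), df$y == 1)
  })
  mean(drops)
}

importance <- sapply(c("x1", "x2"), perm_importance)
print(round(importance, 3))
```

Shuffling the informative column should cost far more AUC than shuffling the noise column. With only a few repetitions the scores for weak predictors are noisy, which is why the number of repetitions matters.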
The table is meant to list the variables in decreasing order of importance. However, every importance value printed above is NA: in the run that produced this output, the shuffled AUCs were computed against the raw outcome factor rather than the numeric 1/2 coding in testOutcome, so GetROC_AUC returned NaN for each shuffle.
With all values NA, order() leaves the rows in their original predictorNames order (age, then the workclass indicators, then the education indicators, and so on), so this output does not identify which variables matter most for the SVM.
For Random Forest the five most important variables are: capital gain, capital loss, marital status, occupation and age.
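The filterVarImp output above gives the area under the ROC curve for each predictor taken on its own with respect to salary (the dependent variable). head() shows only the first six predictors in column order: age, workclass, education, education_num, marital_status and occupation. Among these, education_num (0.714) and age (0.682) separate the two income classes best, while workclass (0.514) is close to chance.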
***